Data Transfers in Hadoop: A Comparative Study

نویسندگان

Ujjal Marjit

Kumar Sharma

Puspendu Mandal

چکیده

Hadoop is an open source framework for processing large amounts of data in distributed computing environment. It plays an important role in processing and analyzing the Big Data. This framework is used for storing data on large clusters of commodity hardware. Data input and output to and from Hadoop is an indispensable action for any data processing job. At present, many tools have been evolved for importing and exporting Data in Hadoop. In this article, some commonly used tools for importing and exporting data have been emphasized. Moreover, a state-of-the-art comparative study among the various tools has been made. With this study, it has been decided that where to use one tool over the other with emphasis on the data transfer to and from Hadoop system. This article also discusses about how Hadoop handles backup and disaster recovery along with some open research questions in terms of Big Data transfer when dealing with cloud-based services. TYPE OF PAPER AND

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...

متن کامل

Sentiment Analysis of Social Networking Data Using Categorized Dictionary

Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding correct opinion or interest from multi-facet sentiment data is a tedious task. In this paper, a method to improve the sentiment accuracy by utilizing the concept of categorized dictionary for sentiment classification and analysis is proposed. A categorized dictiona...

متن کامل

Cluster Computing Paradigms– A Comparative study of Evolving Frameworks

Cluster computing is an approach for storing and processing huge amount of data that is being generated. Hadoop and Spark are the two cluster computing platforms which are prominent today. Hadoop incorporates the MapReduce concept and is scalable as well as fault-tolerant. But the limitations of Hadoop paved way for another cluster computing framework named Spark. It is faster and can also mana...

متن کامل

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

متن کامل

A Comparative Study of Hadoop-based Big Data Architectures

Big Data is a concept popularized in recent years to reflect the fact that organizations are confronted with large volumes of data to be processed and this, of course, presents a strong commercial and marketing challenge. This trend around the analysis and collection of Big Data has given rise to new solutions that combine traditional data warehouse technologies with Big Data systems in a logic...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

OJBD

دوره 1 شماره

صفحات -

تاریخ انتشار 2015

Data Transfers in Hadoop: A Comparative Study

نویسندگان

چکیده

منابع مشابه

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Sentiment Analysis of Social Networking Data Using Categorized Dictionary

Cluster Computing Paradigms– A Comparative study of Evolving Frameworks

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

A Comparative Study of Hadoop-based Big Data Architectures

عنوان ژورنال:

اشتراک گذاری